<html>
<head>
<title>GPTL usage example 1</title>
<meta name="example" content="Manual GPTL instrumentation">
<meta name="Keywords" content="gptl","papi","call tree","profile","timing","performance analysis">
<meta name="Author" content="Jim Rosinski">
</head>
<body bgcolor="peachpuff">

<hr />
<a href="example2.html"><img border="0" src="btn_next.gif"
			     width="100" height="20" alt="Example 2" /></a>

<a href="index.html">Back to GPTL home page</a>

  <br />

<h2>Example 1: Manual instrumentation of a simple OpenMP Fortran code</h2>
This is an OpenMP Fortran code manually instrumented with <b>GPTL</b> calls. The 
output produced by the embedded call to <b>gptlpr()</b> is shown and explained.
<p>
<b><em>simpleomp.f90:</em></b>
<pre>
<div style="background-color:white;">
program simpleomp
  use gptl
  implicit none
  integer :: ret, iter
  integer, parameter :: nompiter = 2 ! Number of OMP threads

  ret = gptlsetoption (gptlabort_on_error, 1) ! Abort on GPTL error
  ret = gptlsetoption (gptloverhead, 0)       ! Turn off overhead estimate
  ret = gptlinitialize ()                     ! Initialize GPTL
  ret = gptlstart ('total')                   ! Start a timer

!$OMP PARALLEL DO PRIVATE (iter)   ! Threaded loop
  do iter=1,nompiter
    ret = gptlstart ('A')          ! Start a timer
    ret = gptlstart ('B')          ! Start another timer
    ret = gptlstart ('C')
    call sleep (iter)              ! Sleep for "iter" seconds
    ret = gptlstop ('C')           ! Stop a timer
    ret = gptlstart ('CC')
    ret = gptlstop ('CC')
    ret = gptlstop ('B')         
    ret = gptlstop ('A')
  end do
  ret = gptlstop ('total')
  ret = gptlpr (0)                 ! Print stats
  ret = gptlfinalize ()            ! Clean up
end program simpleomp
</div>
</pre>

Compile and link, then run. 
<pre>
% gfortran -fopenmp simpleomp.f90 -I${GPTL}/include -L${GPTL}/lib -lgptlf -lgptl
% env OMP_NUM_THREADS=2 ./a.out
</pre>

The call to <b>gptlpr(0)</b> wrote a file named timing.0, which looks like this:

<pre>
<div style="background-color:white;">
Stats for thread 0:
             Called  Recurse     Wall      max      min
  total           1     -       2.000    2.000    2.000
    A             1     -       1.000    1.000    1.000
      B           1     -       1.000    1.000    1.000
        C         1     -       1.000    1.000    1.000
        CC        1     -    0.00e+00 0.00e+00 0.00e+00
Overhead sum =   1.8e-06 wallclock seconds
Total calls  = 5

Stats for thread 1:
            Called  Recurse     Wall      max      min
  A              1     -       2.000    2.000    2.000
    B            1     -       2.000    2.000    2.000
      C          1     -       2.000    2.000    2.000
      CC         1     -    0.00e+00 0.00e+00 0.00e+00
Overhead sum =  1.44e-06 wallclock seconds
Total calls  = 4

Same stats sorted by timer for threaded regions:
Thd      Called  Recurse     Wall      max      min
000 A         1     -       1.000    1.000    1.000
001 A         1     -       2.000    2.000    2.000
SUM A         2     -       3.000    2.000    1.000

000 B         1     -       1.000    1.000    1.000
001 B         1     -       2.000    2.000    2.000
SUM B         2     -       3.000    2.000    1.000

000 C         1     -       1.000    1.000    1.000
001 C         1     -       2.000    2.000    2.000
SUM C         2     -       3.000    2.000    1.000

000 CC        1     -    0.00e+00 0.00e+00 0.00e+00
001 CC        1     -    0.00e+00 0.00e+00 0.00e+00
SUM CC        2     -    0.00e+00 0.00e+00 0.00e+00

</div>
</pre>

<h3>Explanation of the above output</h3>
The output file displayed here leaves out preample information such as
how the GPTL library was built, name of the underlying timing routine, and so
forth. The statistics themselves begin with the line which reads "Stats for
thread 0:". The region names are listed on the far left. A "region" is
defined in the application by calling
<b>GPTLstart()</b>, then <b>GPTLstop()</b> for the same input (character
string) argument. Indenting of the names preserves parent-child
relationships between the regions. In the example, we see that region "A" was
contained in "total", "B" contained in "A", and regions "C" and "CC" both
contained in "B".
<p>
Reading across the output from left to right, the next column is labelled
"Called". This is the number of times the region was invoked. If any regions
were called recursively, that information is printed next. In this case there
were no recursive calls, so just a "-" is printed. Total wallclock time for
each region is printed next, followed by the max and min values for any
single invocation. In this simple example each region was called only once, so
"Wallclock", "max", and "min" are all the same.
<p>
Since this was a threaded code run with OMP_NUM_THREADS=2, statistics
for the second thread are also printed. It starts at "Stats for thread 1:" The
output shows that thread 1 participated in the computations for regions "A",
"B", "C", and "CC", but not "total". This is reflected in the code itself,
since only the master thread was active when start and stop calls were made
for region "total". 

<p>
After the per-thread statistics section, the same information is repeated, sorted by
region name if more than one thread was active. This section is delimited by
the string "Same stats sorted by timer for threaded regions:". This region
presentation order makes it easier to inspect for load 
balance across threads. The leftmost column is thread number, and the region
names are not indented. A sum across threads for each region is also printed,
and labeled "SUM".

<hr />
<a href="example2.html"><img border="0" src="btn_next.gif"
			     width="100" height="20" alt="Example 2" /></a>

<a href="index.html">Back to GPTL home page</a>

<br />

</html>
